Tuesday, January 10, 2012

LINQ sorting is also vulnerable

This is a follow-up post to .Net's Sort Is Not Secure. Don't Use It. Here's a Better One.

A reader asked if LINQ's sort has the same vulnerability. I added tests for sorting an array using LINQ's 'orderby' clause. Another reader wondered if the Sort() method for generic lists had the same problem so I added that test as well.

Time required for sorting one million items provided by an adversary:

AlgorithmSorting Time
Zimbry Introsortone-half second
.Net CLR (Array)1,588 seconds
.Net CLR (LINQ)1,916 seconds
.Net CLR (List)1,588 seconds

Detailed benchmark results are here: Sorting up to 1 million items (with an adversary)

Notes on the test:

All tests were run under .Net framework version 4.0

The times reported in the table are an average of the times taken for one ascending sort and one descending sort.

Friday, January 6, 2012

Quicksort: Is picking a pivot at random secure?

Eric offered this suggestion:
Another way to harden quicksort is to select the elements used in the pivot calculation randomly. This makes it nearly impossible for an attacker to cause worst case performance, and doesn't require you to also implement multiple sort algorithms.
This is well-worth considering. A single sort algorithm is simpler than combining multiple algorithms so there has to be a compelling benefit gained to justify the additional complexity.

I've done some testing and benchmarks but before we jump to the conclusion take a few minutes to ponder this with me and let's see just how far down this rabbit hole goes.

Hardening quicksort is a thorny problem and for an interesting reason. Quicksort's soft spot is in the method used to estimate the median. The risk that a sequence of partitionings will be unbalanced and lead to degenerate performance depends directly on the quality of the estimate. So why bother with a questionable estimate when it's easy to pick the actual median and have a perfectly secure quicksort? Ah, because there's a catch. Finding the actual median requires almost as much effort as it would to perform the sort itself. So, the only way that quicksort is viable is by making estimates and, as such, most quicksort algorithms are defined by their estimating technique: first item, random item, median-of-three, median-of-N, median-of-random-three, median-of-medians-of-three... you name it and it's probably been tried.

As long as we are estimating the median we have to accept that we are, in effect, rolling the dice and hoping a rare series of bad choices does not happen. And if an attacker is present they are only too happy to hand us loaded dice to ensure the worst possible outcome.

But wait... For an attack to work the attacker must be able to anticipate our method of estimating a median. What happens if we choose it at random? Does this defeat the attacker?

First, what exactly do we mean by 'random'? The pseudo-random number generator provided by most frameworks (the Random class in .Net is a good example) is highly predictable. It won't slow down an attacker. Alternatively, you could use a cryptographically secure random number generator but that comes at a performance penalty that almost certainly makes it unworkable. That cost will come down soon, though. Intel has new CPUs coming this year that offer fast, high-quality random number generation in hardware.

Let's assume we have that CPU today and it delivers fast and cryptogaphically secure random values. We are probably(1) not at risk from an attacker.

Not so fast. There are still two problems we can't erase:

1. Although highly unlikely, it doesn't eliminate the risk of a series of low-probability but catastrophic choices. I have seen quicksorts go quadratic on natural data, without the presence of an attacker, even when using the median-of-three method. Choosing at random should be more resilient. Whether this risk is acceptable or not depends on whether the sort is part of tax preparation software or whether it's running the auto-pilot of a jumbo jet.

2. It makes low-quality estimates of the median. Partitions will be less balanced and performance will suffer for it.

There is only one way I know of to reliably harden quicksort against both an attacker and an unlucky series of choices: Set an upper limit on how much effort quicksort is allowed to spend before abandoning it entirely.

Introsort is secure because it limits how many bad partitions it can accept before dropping out of quicksort and switching to heapsort to finish the job.

Of course, we could throw away all the complexity and just use heapsort. It has no degenerate cases and is secure against an attacker. But there's a catch. The reason we don't embrace heapsort so quickly is it tends to be significantly slower than quicksort in the average case.(2)

Let's look at the benchmarks: Sorting up to 100 million items (with no adversary)

The final algorithm, Zimbry_PureQuicksortFatPivotR1, is a pure quicksort (does not fall back to an insertion sort on tiny partitions) that uses a pseudo-random number generator to choose the pivot. It's what Eric suggests: A single quicksort algorithm that chooses a pivot at random. Introsort is 19% faster on average and is not vulnerable to either an attacker or at risk of degenerating on a bad series of random pivot choices.

So, here we are, at the bottom of the rabbit hole, and we find ourselves surrounded by trade-offs:

1. We can have a pure quicksort if we're willing to accept slower performance and the highly unlikely chance of bad behavior.

2. We can be invulnerable to both attackers and bad luck by choosing heapsort but that armor comes at a severe average performance cost compared to quicksort.

3. We can "have our performance cake and eat it too" by building a quicksort that falls back to heapsort in an emergency. We get good performance on average and no bad behavior to worry about but it comes at the cost of additional engineering complexity.

Engineering is all about trade-offs.

                                                                                                                                                            

(1) A motivated attacker won't give up so easily. If there is a flaw in Intel's random number generator it will be found and exploited. It's best to give the community (both the good guys and bad guys) time to stress-test any new security software or hardware before adopting it for your own use.

(2) Ball-park comparison: A well-implemented heapsort is typically three to five times slower than a well-implemented quicksort.

Thursday, January 5, 2012

Vote for fixing .Net's sorting security vulnerability


Several people suggested reporting this to Microsoft. Good idea. If you feel it is important to fix this you can vote for it at the following link:

Bug: .Net's sort is not secure and is vulnerable to an attacker who can use it to create a DOS attack


Tuesday, January 3, 2012

.Net's Sort Is Not Secure. Don't Use It. Here's a Better One.

.Net's Array.Sort (up to at least version 4.0) has serious weaknesses:

1. It is insecure and using it makes you vulnerable to a malicious attacker. .Net uses an ordinary quicksort with the pivot selected by the median-of-three method. It is easy to provoke quicksort's worst-case (quadratic) behavior and increase running times by multiple orders-of-magnitude. An attacker will be happy to exploit this as an effective denial-of-service attack.

2. It is inflexible. It does not allow you to provide a delegate for the swap function so sorting data structures where data synchronization is required to maintain consistency as items are moved is impossible. Also, you can only sort items on the .Net heap so sorting unmanaged memory is impossible.

3. It is slower than it should be even in the absence of an attacker.

Zimbry.Introsort addresses each of these problems.

1. It is secure. It is based on David Musser's Introsort algorithm. Introsort is essentially a quicksort that, should it fail, falls back to a secure heapsort.

2. It is flexible. Both the compare and swap operations are provided by the user. You can use it to sort anything.

3. It is faster. This wasn't an explicit objective but it's nice that we don't have to trade away performance to get a secure and flexible sort.

Click the links to see the benchmarks:


Let's look at the worst-case of dealing with an adversary.

It takes .Net over 26 minutes to sort one million integers when they are provided by an adversary. Zimbry.Introsort does it in half a second.

Those are the worst-case results. We can disable the adversary and benchmark it again:


Zimbry.Introsort is twice as fast in the average case and rarely less than 13% faster in any case.

(Each test was run only once so the timings for small arrays contain noticeable sampling noise. A more robust benchmark would filter multiple samples.)

I am releasing the source under the MIT license: Click here for the source

Some notes on the source:

You'll find many alternative sort algorithms in the Zimbry.Sort.OtherSorts project. I experimented with these along the way. You can enable them in the benchmark if you have a great deal of patience.

The class in QuicksortAdversary.cs was derived from Doug McIlroy's paper, A Killer Adversary for Quicksort. Be careful. It will beat up quicksort and steal its lunch money.

Zimbry.Introsort contains four sort algorithms layered together:

1. Quicksort with pivot selected by median-of-nine: For large partitions.

2. Quicksort with pivot selected by median-of-five: For small partitions.

3. Heapsort as a fall-back when quicksort recurses too deep: Heapsort is slower than quicksort in the best case but it has no quadratic behavior to exploit so it provides effective protection against an adversary.

4. Insertion sort: For tiny partitions where quicksort is inefficient.

Using these four algorithms lets us enjoy the performance advantage of quicksort for the typical case with protection against a malicious attacker in the worst case.

Both quicksorts use Bentley & McIlroy's "fat-pivot" partitioning method from their paper, Engineering a Sort Function, for better performance. This is a big part of why it performs better than .Net's quicksort in many tests.

While this is an improvement it is far from the last word in sorting. Some ideas to consider:

Better performance may be found with Vladimir Yaroslavskiy's dual-pivot quicksort.

It really needs special versions for handling known data types (avoiding the requirement for using compare and swap delegates in all cases). This would give a significant speed improvement.

There's more room for performance tuning. I tried to leave the code in a fairly readable state and some sacrifices could be made to buy a little more performance.

It would be nice to add support for stable sorting.