Friday 10 August 2012

C#, F# Algorithm: Faster than QuickSort?

Status: Completed
Competition start date: 2012.08.10
Competition end date: 2012.08.24
Download associated files (Version 1.5 with results)

Please make sure to read the Welcome page for general competition guidelines.
If you would like to participate in this competition please register an account on vWorker.com

Update:

2012.08.10: Each competitor is allowed 5 entries.
2012.08.11: Notes on SD at the end of the post.
2012.08.12: More notes on SD at the end of the post.
2012.08.17: New ranking method explained at end of post.
2012.08.30: Tentative Leader Board posted. Follow the discusion on GoogleGroups 
2012.08.30: Updated download files with code for each participant
2012.10.03: Winner is "Zaher". Congratulations! The source code is now available on github

Final Leader Board:

1 - Zaher, AvgRatio: 18.167, SD: 0.804, WorstCase: 16.136
2 - Renze, AvgRatio: 16.415, SD: 0.036, WorstCase: 16.343
3 - V_Tom_R, AvgRatio: 14.073, SD: 0.114, WorstCase: 13.919
4 - SoftwareBender, AvgRatio: 12.829, SD: 0.058, WorstCase: 12.739
5 - c6c, AvgRatio: 12.526, SD: 0.069, WorstCase: 12.380
6 - ErwinReid, AvgRatio: 11.452, SD: 0.036, WorstCase: 11.372
7 - Mihai, AvgRatio: 10.425, SD: 3.642, WorstCase: 5.725
8 - MoustafaS, AvgRatio: 10.024, SD: 0.186, WorstCase: 9.761
9 - aus1, AvgRatio: 7.972, SD: 0.186, WorstCase: 7.489
10 - bawr, AvgRatio: 5.317, SD: 0.240, WorstCase: 4.835

There is quite a lot of difference between the tentative leader board and the final leader board due to the way the various submissions where run. Thanks go to Renze for pointing this out and providing the code to fix it.

Final tentative Leader Board:

1. bawr - AvgRatio: 23.303, SD: 0.062, WorstCase: 23.201
2. Zaher - AvgRatio: 17.245, SD: 0.556, WorstCase: 16.822
3. c6c - AvgRatio: 11.495, SD: 0.413, WorstCase: 10.428
4. SoftwareBender - AvgRatio: 10.259, SD: 0.033, WorstCase: 10.204
5. ErwinReid - AvgRatio: 9.879, SD: 0.052, WorstCase: 9.785
6. V_Tom_R - AvgRatio: 9.713, SD: 0.022, WorstCase: 9.666
7. Mihai - AvgRatio: 9.656, SD: 0.097, WorstCase: 9.425
8. Renze - AvgRatio: 9.219, SD: 0.303, WorstCase: 8.438
9. MoustafaS - AvgRatio: 9.069, SD: 0.368, WorstCase: 8.362
10. aus1 - AvgRatio: 5.025, SD: 0.025, WorstCase: 4.961

New Leader Board:

1. Zaher Al-Sibai - ratio: 13.846 (SD: 0.075) (Worst case: 13.742) entry 1
2. Renze - ratio: 11.870 (SD: 0.081) (Worst case: 11.768) entry: 5
3. bawr - ratio: 10.213 (SD: 0.070) (Worst case: 10.088) entry 1
4. Software_bender - ratio: 9.738 (SD: 0.068) (Worst case: 9.601) entry 2
5. Renze - ratio: 9.129 (SD: 0.074) (Worst case: 8.994) entry: 4
6. MoustafaS  - ratio: 9.107 (SD: 0.096) (Worst case: 8.996) entry: 4
7. Erwin Reid  - ratio: 7.957 (SD: 0.113) (Worst case: 7.787) entry: 2
8. aus1 - ratio: 4.956 (SD: 0.085) (Worst case: 4.765) entry: 4
9. V_Tom_R - ratio 3.643 (SD: 0.062) (Worst case: 3.498) entry: 2

Old Leader Board:

1. Software_bender - ratio: 7.165 (SD: 0.130) (Worst case: 7.008) entry 2
2. MoustafaS  - ratio: 7.085 (SD: 0.113) (Worst case: 6.950) entry: 3
3. Renze - ratio: 6.548 (SD: 0.063) (Worst case: 6.435) entry: 3
4. Renze - ratio: 5.907 (SD: 0.156) (Worst case: 5.731) entry: 2
5. aus1 - ratio: 4.100 (SD: 0.145) (Worst case: 3.891) entry: 4
6. MoustafaS - ratio: 3.833 (SD: 0.094) (Worst case: 3.744) entry: 2
7. V_Tom_R - ratio 3.392 (SD: 0.108) (Worst case: 3.203) entry: 2
8. V_Tom_R - ratio 3.271 (SD: 0.129) (Worst case: 3.164) entry: 1
9. Software_bender - ratio: 2.754 (SD: 0.121) (Worst case: 2.658) entry 1
10. aus1 - ratio: 2.537 (SD: 0.080) entry: 3
11. aus1 - ratio: 2.475 (SD: 0.076) entry: 2
12. MoustafaS - ratio: 2.396 (SD: 0.086)
13. Renze - ratio: 2.340 (SD: 0.074)
14. aus1 - ratio: 2.04 (SD: 0.11) entry: 1

Introduction:

The purpose of this competition is to extract the start and end times from a list of DateRanges (which has already been pre-sorted) as a sorted DateTime list.

A DateRange can be thought of as a period of time that has well defined start and end time. (Please have a look at the class Orcomp.Entities.DateRange to get a better understanding of how this class works.)

The DateRange class implements the IComparable Interface, such that the DateRanges with earliest start times come first.

If two DateRanges have the same start time, then the DateRange with the shortest duration comes first.

A pictural representation of a sorted list of DateRanges may look like this: (Each line represents a DateRange start and end time.)

+--------------------------------------+
          +--------+
          +-------------+
          +---------------------+
                +--+
                    +---------------------------------------------------------+
                               +------+

The simplest way to get a sorted list of DateTimes from a list of DateRanges is to use the following code:


In the example shown above the Sort() method is used to order the DateTime values. Internally Sort() uses a QuickSort implementation.

This approach does not take advantage of the fact that we already have a list of pre-ordered DateRanges.

Therefore the aim of this competition is to implement the GetSortedDateTimes() method in the Orcomp.Extensions.DateRangeCollectionExtensions.cs file, taking this fact into account.

The implementation must also pass the unit test defined in the Orcomp.Tests project defined in the DateRangeCollectionExtensionsTest.cs file.

Finally once you have an implementation that works you can run the Orcomp.Benchmarks project to find out if your code runs faster or slower than the standard implementation.

Ratios greater than one indicate your code runs faster than the standard implementation.

Rules:

- There are none, as long as your code passes the unit test, it is a matter of coming up with the fastest implementation.
- Your code will be benchmarked against the standard algorithm defined in the Orcomp.Benchmarks project.
- The fastest code wins.

Notes:

- A ranking of submitted solutions will be kept up to date at the top of this post. If you want to stay anonymous please let me know when you submit your code.
- Please only submit the DateRangeCollectionExtensions.cs file.
- As a guide the standard deviation after running the benchmark should be less than 0.15.
- For F# users please send me an F# file with the implementation in it.

Notes on SD:

- The Standard Deviation (SD) is only a number which indicates the consistency or reliability of your benchmark run. If the the time result for all 20 iterations of the benchmark are very similar then the SD will approach 0. If, on the other hand the time results vary a lot, the SD will approach 1.
- The SD is influenced by other processes running on your computer at the time you run the benchmark. You will see you will get a different SD value every time you run the benchmark, so just run it as many times as you need to get an average with an SD less than 0.15. (The lower the SD the better.)
- The average ratio value that people are getting on their computer is different from what I am getting on my computer. I am not sure why, but please do not be alarmed about this. Towards the end of this competition I will be comparing the top entries against each other.

More notes on SD:

- Even though it is not recommended collecting the garbage in general, it does help to do so before each stopwatch.Reset() in order to obtain more consistent results.
- As the code being submitted is getting faster and faster it is harder to keep the SD below 0.15, therefore the worst case ratio is also included in the ranking to give an appreciation of the  spread between the times of the various runs.
- I have also noticed that the ratios fluctuate most at the beginning of a benchmark and then become a lot more reliable towards the end, therefore I might change the calculation to only include the last 10 loops in the benchmark.
- Another approach may be to forget the SD and accept a benchmark such that:
worst case ratio > average ratio * 0.95
best case ratio < average ratio * 1.05
(i.e. we have a 5% window.) This is short of doing a full blown confidence or tolerance interval analysis, but is probably good enough for what we trying to achieve.

New ranking method:

The benchmark for 1,000,000 date ranges is run 20 times.
The QuickSort implementation times are recorded for each loop and the last 10 results are selected. Outlayers are removed and the average is calculated.
The competitors implementation times are also recorded for each loop and the last 10 results are selected. Outlayers are removed and the ratio (using the average value for the QuickSort) is calculated.
The Benchmark is run in detached debugger mode (CRL - F5).
The Benchmark data has NOT changed.