Halfbakery: Tournament Sort

A while ago I came up with a sorting algorithm that uses slightly fewer than n lg n comparisons and has the advantage that the results are output in order (i.e. the algorithm finds the first item, then the second, and so on). Its one (major!) weakness is that the amount of "bookkeeping" required is substantial. Nonetheless, if one has a lot of data to sort, comparisons are expensive, and getting the items served up "in order" is useful (e.g. if one can start doing something with the data as soon as it starts coming out), the algorithm may be of some use.

Each data item must have three pieces of information associated with it: a 'weight' value (discussed below), a status (active or inactive), and a list of the data items it has "beaten" (also discussed below). A heap, tree, or chain-bucket data structure should be used so that active items can be read out in approximate order of weight.

Initially all items are active and assigned a weight of one. The outer loop of the sort procedure is then:

- Examine active items pairwise, in order of "weight". In each pair, the item that should sort earlier has the other item's weight added to its own; the item that should sort later is marked inactive and added to the first item's list of "beaten" items, with its own weight left unchanged. Repeat this until only one active item remains.
- Output the remaining active item and return all the items it had "beaten" to active status. The output item is then permanently deleted from the list.
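The two steps above can be sketched in Python roughly as follows. This is only a minimal illustration, not the implementation I actually ran: the name `tournament_sort`, the `less` parameter, and the use of a binary heap keyed on (weight, index) are all assumptions made for the example, as is the choice to pair off the lowest-weight items first (the description only says "in order of weight").

```python
import heapq

def tournament_sort(items, less=lambda a, b: a < b):
    """Yield the elements of `items` in sorted order, one at a time.

    Hypothetical sketch: `less(a, b)` stands in for the expensive comparison.
    """
    n = len(items)
    weight = [1] * n                  # every item starts with weight one
    beaten = [[] for _ in range(n)]   # indices each item has beaten
    # Assumption: the heap holds exactly the active items, lowest weight first.
    active = [(1, i) for i in range(n)]
    heapq.heapify(active)

    while active:
        # Step 1: pair off active items until only one is left.
        while len(active) > 1:
            _, a = heapq.heappop(active)
            _, b = heapq.heappop(active)
            if less(items[b], items[a]):
                a, b = b, a           # a is now the item that sorts earlier
            weight[a] += weight[b]    # winner absorbs the loser's weight
            beaten[a].append(b)       # loser goes inactive, weight unchanged
            heapq.heappush(active, (weight[a], a))
        # Step 2: output the survivor and reactivate everything it beat.
        _, winner = heapq.heappop(active)
        yield items[winner]
        for i in beaten[winner]:
            heapq.heappush(active, (weight[i], i))
```

Because it is a generator, `next(tournament_sort(data))` produces the first item after only the first pass, which is the "served up in order" property described above.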

On the first pass through the outer loop, all data items are active; since each comparison deactivates one item, the first pass requires n-1 comparisons before yielding the first output item.

The second pass through the outer loop will be much faster, since the only active items will be those that lost a direct comparison to the first pass's winner. There will be roughly lg n of those, so the second pass will require roughly lg n comparisons.

The third pass will be even shorter, though exactly how much so will depend upon which data item from the second pass was the "winner".
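To see how the per-pass cost behaves, the same procedure can be instrumented to count comparisons in each pass. Again this is just a sketch under assumptions: `comparisons_per_pass` is a hypothetical helper, it uses Python's built-in `<` as the comparison, and it pairs the lowest-weight active items first via a heap.

```python
import heapq

def comparisons_per_pass(items):
    """Return (sorted_items, per_pass), where per_pass[k] is the number of
    comparisons used in pass k of the outer loop. Hypothetical helper."""
    n = len(items)
    weight = [1] * n
    beaten = [[] for _ in range(n)]
    active = [(1, i) for i in range(n)]   # (weight, index) of active items
    heapq.heapify(active)
    out, per_pass = [], []

    while active:
        used = 0
        while len(active) > 1:
            _, a = heapq.heappop(active)
            _, b = heapq.heappop(active)
            used += 1                     # one comparison per pairing
            if items[b] < items[a]:
                a, b = b, a
            weight[a] += weight[b]
            beaten[a].append(b)
            heapq.heappush(active, (weight[a], a))
        _, winner = heapq.heappop(active)
        out.append(items[winner])
        per_pass.append(used)
        for i in beaten[winner]:          # reactivate the winner's victims
            heapq.heappush(active, (weight[i], i))
    return out, per_pass

result, counts = comparisons_per_pass([6, 1, 7, 3, 0, 5, 2, 4])
print(counts[0])    # first pass: n - 1 = 7 comparisons
print(sum(counts))  # total comparisons for the whole sort
```

Comparing `sum(counts)` against n lg n for inputs of various sizes is one way to check the comparison-count claim on one's own data.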

I have run this algorithm on a 1,000,000-item data set and found that it can be implemented reasonably effectively. The "bookkeeping" is not totally unreasonable, though it is a bit complex. As to whether there are any situations where comparisons are expensive enough to justify it, I have no idea. But perhaps someone here might.