O(n log n)
Hypothetically, let us say that you are a Famous Computer Scientist, having graduated from the Famous Computer Scientist School in Smellslikefish, Massachusetts. As a Famous Computer Scientist, people are throwing money at you hand over fist. So, you have a big pile of checks. (You remember checks, little slips of paper with signatures? They represent money.)
You also have a problem: A shiny new Boeing 747 has caught your eye and you need to know whether your pile of checks represents enough money to pay for it. That problem is easy to solve, since all you need to do is to add up the amounts of the checks. But you also have a more pressing problem: Do you have enough time to sum up your checks before the 747 dealer closes for the day?
Being a Famous Computer Scientist, you know something that few others know: that you have to sort the checks, smallest to largest, before you can add them up.
A digression: That is the lowest form of computer science humor, a somewhat obscure reference to a painful fact about computing hardware. In this case the problem is that numbers such as $3141.59, $100000.00, and $2.57 are represented by the computer as floating point numbers. Floating point numbers are effectively represented in scientific or exponential notation, such as 3.14159 * 10^3, 1.0000000 * 10^5, or 2.57 * 10^0. The computer, however, only allocates a certain number of digits for each number, as if instead of using one digit to the left of the decimal point in the exponential notation followed by further digits to the right, it only had 3 (or 6 or 8) to the left of the decimal and no digits to the right. That would mean that 3141.59 would be represented in the computer as 314*10^1, 100000.00 as 100*10^3, and 2.57 as 257*10-2.
Adding floating point numbers is a multi-step process: the computer first adjusts the numbers so the exponents are the same, then adds the values. But 257^10-2 added to 100*10^3 becomes 0*10^3 or 0.0 when adjusted, so 2.57 + 100000.00 equals 100000.00.
To help mitigate that loss of precision, you need to sort your numbers from smaller to larger before you add them, in hopes that the small numbers will add up to something large enough to avoid completely disappearing when you add the sum to the larger numbers.
There is a discipline dedicated those kind of painful problems, called numerical analysis. Unfortunately, I am supremely unqualified to talk about that discipline, so I shall just quote another Famous Computer Scientist, Edsger W. Dijkstra: End of digression.
The bottom line here is that you need to sort your pile of checks before you can add them together, and you need to know how long it will take to sort them to decide whether you can buy your new bird today or not.
At this point, as a Famous Computer Scientist myself, I am going to use a very snazzy computer science trick to avoid having to continually talk about checks and 747s: abstraction.  I will throw away all the irrelevant details, including the 747 dealer (who wants to go home now anyway) and in fact both the numbers and the addition operation, and state the problem this way: you have a list of things that you need to sort, and you want to know how long it will take. The only operation you have on the things is to compare two and ask, "Is this one smaller than that one?"
Sorting is a fundamental computer science problem and is well studied. There are many, many algorithms known for sorting. (The algorithm is fundamental to computer science. If you have a program, it is made up of code written by a programmer. The code implements one or, more likely, many algorithms. An algorithm is a high-level, general, abstract description of a process, while the code is the details; the low-level, concrete, specific description of the process plus a huge bundle of other trivia.)
Each of the algorithms has its own behavior, and its own speed. But what does it mean to talk about the speed of an algorithm, which is a thing of the mind and does not do anything? That is where the "Big-Oh" notation comes in.
For every algorithm, any operation used in that algorithm, and any input given to the algorithm, there is a mathematical expression that describes how many times the operation is used on that input by that algorithm. Consider a simple algorithm, going through a list of 12 numbers counting how many there are in the list, one by one. If the operation is looking at a number to determine whether it is the last one in the list, then the operation is used 12 times by that algorithm on that input.
The expression normally depends on the size of the input: if that simple algorithm is given a list of length n (the variable traditionally used for such things), then it will use that operation n times.
Unfortunately, very often the mathematical expression is complex, not very helpful, and very unenlightening. The big-oh notation is designed to highlight the most important part of the expression while hiding irrelevant details. It describes the asymptotic behavior of the expression as the inputs of the expression get larger. In this case, that means that it describes the behavior of the algorithm as the size of the inputs to it get bigger.
Take, for example, the expression 37n^3 + 12n^2 + 19. If n is 1, that equals 37+12+19 = 68. If n is 10, it equals 37*1000 + 12*100 + 19 = 37000 + 1200 + 19. Clearly, as n gets bigger, the final 19 becomes irrelevant. Less clearly, 12n^2 component also becomes irrelevant, because it will be dwarfed by the first. Finally, the 37 is also irrelevant since 37 times a huge number is just another huge number. The asymptotically important part of the expression is the n^3 element. So, 37n^3+12n^2+19 is O(n^3), pronounced "order of n-cubed".
The "order of" an expression describes a kind of upper bound for the expression. It is defined as an expression which, when multiplied by some constant, is greater than or equal to the original expression for all input values larger than some other constant. Because we can choose the new expression, we can pick one that is simpler than the original in the same way that n^3 is simpler than 37n^3+12n^2+19.
(By the way, in case you were wondering, when n = 13, 37n^3+12n^2+19 equals 83336, while 38*n^3 equals 83486. So, n^3 multiplied by 38 is greater than the original expression for all input values greater than n=12. Hence, 37n^3+12n^2+19 is indeed O(n^3).)
The behavior of the counting algorithm is O(n), "order of n". How does that describe the speed of the algorithm? If each operation takes a finite (and by assumption, constant) time, then the time taken by any implementation of the algorithm will depend on the size of the input primarily by n times the time taken by the operation (plus some constant factors here and there). The Big-Oh notation provides a way of comparing two algorithms; for a sufficiently large value of n, n^2 will be greater than 10000n, and thus an O(n^2) algorithm will be slower than an O(n) algorithm for any sufficiently large input. There is a traditional hierarchy of algorithms:
- O(1) is constant-time; such an algorithm does not depend on the size of its inputs.
- O(n) is linear-time; such an algorithm looks at each input element once and is generally pretty good.
- O(n log n) is also pretty decent (that is n times the logarithm base 2 of n).
- O(n^2), O(n^3), etc. These are polynomial-time, and generally starting to look pretty slow, although they are still useful.
- O(2^n) is exponential-time, which is common for artificial intelligence tasks and is really quite bad. Exponential-time algorithms begin to run the risk of having a decent-sized input not finish before the person wanting the result retires.
There are worse; like O(2^2^...(n times)...^2).
How does that apply to sorting? Sorting, as a problem, is clearly at least as bad as O(n), since it has to look at each item, but that is just as clearly not a good estimate. Can a better bound be found? What do the well-known sorting algorithms do?
The best sorting algorithms, going by names like "heapsort" and "quicksort", are O(n log n). But that does not necessarily mean those are the absolute best algorithms. There might well be a better one yet to be discovered. 
So, is there a way to find the performance of any sorting algorithm? In this case, the answer is yes.
Going back to our operation, comparison, it takes two items and says either "yes, this one is smaller than that", or "no, this one is not smaller than that". It is a binary comparison; it produces one of two possible answers.
A sorted list is a permutation of the original list; it is the same list with the elements rearranged. For a list of n elements, there are n! possible permutations; that is n * (n-1) * (n-2) * ... * 1. The question becomes, how many comparisons do you need to pick out one specific permutation out of the n! possibilities?
Each comparison, because it is binary, reduces the possibilities by half. So, the answer to the question is a number of comparisons, c, such that 2^c >= n!. Solving for c, that is
c >= log n! = log (n * (n-1) * ... * 1) = log n + log (n-1) + ... + log 1
The last expression is of the order of
log n + log (n-1) + ... + log (n/2)
which is of the order of
n/2 * log (n/2)
which is of the order of n log n. 
Ultimately, using only comparisons, sorting is O(n log n); any fewer comparisons would not be able to pick out a single permutation from the possibilities. 
Heapsort is therefore about as good as sorting algorithms are going to get; anything better will be only an incremental improvement. (Quicksort has bad behavior for certain inputs; for example, using quicksort on an already sorted input is O(n^2).)
A creative part of computer science is often that changing the problem, or adding information, allows dramatic improvements. For example, if I can change the problem to adding a single new element to an already sorted list while keeping it sorted, I can easily find an O(n) algorithm; simply taking the new item on the end and re-sorting would be foolish when I could just go down the list and identify the spot where the new item goes.
(That might give you an idea for beating the problem. Taking one item as a list, it as already sorted. Adding another item to the list is O(n). Adding a third is also O(n). But the resulting algorithm, called "insertion sort", for creating a sorted list from a list of n items is quite bad: Using an O(n) algorithm n times is O(n^2).)
Physics often says fundamental things about the universe. Mathematics is "the queen of the sciences" (even though it is not a science) because it frequently makes final, absolute, dramatic statements. But here is one from computer science: sorting using only basic comparisons is a fundamental problem. (Have you ever tried to find a word in an unsorted dictionary?) The shortest possible time it takes to sort n items is of the order of n log n. And, the heapsort algorithm, if it is not itself the best possible algorithm, is the neighbor of the best.
I am seeing a lot of traffic to this page, presumably from people interested in the O(e) notation. If that describes you, here are some good links to learn more, in a slightly more serious fashion:
- Big O notation from Wikipedia is reasonably complete, reasonably correct (as far as I can see), and reasonably complex. Probably the exact opposite of the "Wikipedia is a wasteland of pop culture" stereotype.
- Plain English Explanation of Big O Notation is more readable and reasonably complete.
- Big O notation from CS Animated. What can I say, it's animated. Actually, it does have a very good illustration of the process of reducing the total number of operations identified in a block of code to a single, simplified Big O (or in that case, the related Big Theta) expression, as well as a thorough discussion of the ideas, aimed at budding computer scientists.
- Check out P and NP for similar entertainment.
fib 0 = 1 fib 1 = 1 fib n = fib (n-1) + fib (n-2)
the time complexity of computing fib n is O(fib n).
I originally wrote this article not to discuss the Big O, but to highlight the properties of the sort algorithm. Algorithm analysis is complicated and interesting on its own. The most entertaining example that I know about is from an article by Jon Bently, collected in one of the Programming Pearls books if I recall correctly. He showed two algorithms for reversing an array of things with the same algorithmic complexity, and then showed that one was much, much, faster than the other because the slower was largely pessimal in regard to page-based virtual memory.
If you are interested in floating point numbers, check out Anatomy of a floating point number or the cannonical What Every Computer Scientist Should Know About Floating-Point Arithmetic .
|||To save time, I will skip the joke (physics, I think) and just give you the punchline: "Assume that each cow is a perfect sphere of uniform density...."|
|||No, I am not referring to the manga or anime series on Cartoon Network. And, no, I do not understand it either.|
|||I have taken this argument from Bruce Mills, Theoretical Introduction to Programming.|
Is it possible to do better by not using comparisons? Yep. I recently ran across a brilliant example of this: Intelligent Design sort. Say you have a deck of 52 playing cards shuffled into a random order that you wish to sort. The probability that you could pick the three of Diamonds first, followed by the Jack of Clubs, and so on (or whatever order the deck is in), is 1/52 * 1/51 * ... * 1, or 1/52!. 52! is
according to my copy of bc. Clearly, the likelihood of this ordering is far too miniscule to happen by chance; a Higher Power must have arranged it. How can we, as mere mortals, improve on such? Therefore, the deck is already sorted. Unfortunately, like most things Intelligent Design, this sort algorithm has no useful properties.
For a better example, suppose that you have many, many decks of cards mixed together in a big heap, and that you want to sort them all by suit and value. There are only 52 possible combinations of face and value: Ace of Hearts, four of Spades, and so forth; every Ace of Hearts is equivalent to every other Ace of Hearts. You arrange 52 bins, examine each card once, and toss it into the appropriate bin. This algorithm, bin sort, sorts the heap in O(n), and can do so because it involves no comparisons between any two cards.
|||For an interesting discussion (with graphs) of the difference between n log n and n algorithms, see O(N log N) Complexity - Similar to linear? on stackoverflow. (And yes, I did give up on keeping the notes in order in the text.)|