I was reading "Programming Pearls" and I am really confused in one of the solution explanations.
The question was: "A file containing at most n positive integers, each less than n, where n = 10^7. Each positive integer could appear at most ten times. How would you sort the file?"
Given solution in the book: " If each integer appears at most ten times, then we can count its occurence in a four-bit half byte. Using the solution to Problem 5 (below) we can sort the complete file in a single pass with 10,000,000/2 bytes, or in k passes with 10,000,000/2k bytes"
Solution to problem 5 is: A two-pass algorithm first sorts the integers 0 through 4,999,999 using 5,000,000/8 = 625,000 words of storage, then sorts 5,000,000 through 9,999,999 in a second pass. A k-pass algorithm sorts at most n non-repeated positive integers less than n in time kn and space n/k.)
I am not getting how author is coming to 10,000,000/2k space to sort. I mean, based on the solution to problem 5, first we need 625K bytes of space to sort in first pass and additional 1/2 byte per integer to store the count right?
Could someone please help me understand this?
Thanks a lot.
You can use 4 bits per counter for this, not 2 bytes. If you group counters you can even lower this value, for example if you group 3 counters, that's 10*10*10=1000 combinations and you need 10 bits (=1024 values) for that.
10,000,000 - because there are 10,000,000 possible values.
2 - because each byte consists of two half-bytes.