Let these datatypes represent unary and binary natural numbers, respectively:
data UNat = Succ UNat | Zero
data BNat = One BNat | Zero BNat | End
u0 = Zero
u1 = Succ Zero
u2 = Succ (Succ Zero)
u3 = Succ (Succ (Succ Zero))
u4 = Succ (Succ (Succ (Succ Zero)))
b0 = End                   -- 0
b1 = One End               -- 1
b2 = One (Zero End)        -- 10
b3 = One (One End)         -- 11
b4 = One (Zero (Zero End)) -- 100
(Alternatively, one could use `Zero End` as b1, `One End` as b2, `Zero (Zero End)` as b3...)
My question is: is there any way to implement the function
toBNat :: UNat -> BNat
that works in O(N), doing only one pass through the UNat?
To increment a binary number, you have to flip its trailing 1s and the single 0 just above them. The cost of this operation is proportional to the number of trailing 1s of your input (for this, you should represent the number as a right-to-left list, e.g. the list [1;0;1;1] encodes 13).
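For concreteness, here is a minimal sketch of that increment on such a right-to-left list (the name inc and the use of plain Int bits are mine, not part of the question):
-- Increment a binary number stored as a right-to-left list of bits
-- (least significant bit first).
inc :: [Int] -> [Int]
inc []       = [1]          -- no digits left: a new leading 1 appears
inc (0 : bs) = 1 : bs       -- flip the first 0 and stop
inc (_ : bs) = 0 : inc bs   -- flip a trailing 1 and keep carrying
The cost is one step per trailing 1 of the number, plus one for the final flip.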
Let a(n) be the number of trailing 1s in the binary representation of n:
a(n) = 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4, ...
and let
s(k) = a(2^k) + a(2^k + 1) + ... + a(2^(k+1) - 1)
be the sum of the terms between two consecutive powers of 2. You should be able to convince yourself that s(k+1) = s(0) + s(1) + ... + s(k) + 1 (with s(0) = 1) by noticing that
a(2^(k+1)), ..., a(2^(k+2) - 1)
is a copy of
a(0), a(1), ..., a(2^(k+1) - 1)
with only its last term increased by 1: putting a 1 in front of a number does not change its trailing 1s, except for the all-ones number 2^(k+2) - 1, which gains one. By induction, s(k) = 2^k.
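If you want to check these values numerically, here is a small throwaway sketch (the names a and s just mirror the text):
-- a n = number of trailing 1s in the binary expansion of n
a :: Int -> Int
a n | odd n     = 1 + a (n `div` 2)
    | otherwise = 0
-- s k = sum of a over the block [2^k .. 2^(k+1) - 1]; evaluates to 2^k
s :: Int -> Int
s k = sum [ a n | n <- [2^k .. 2^(k+1) - 1] ]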
Now the cost of incrementing a number N times is proportional to
a(0) + a(1) + ... + a(N)
<= s(0) + s(1) + s(2) + ... + s(log(N))
= 2^0 + 2^1 + 2^2 + ... + 2^log(N)
= 2^(log(N) + 1) - 1 <= 2N - 1
(each increment also flips one 0, which adds at most another N steps and does not change the linear bound).
Therefore, if you take care of representing your numbers right-to-left, the naive algorithm is linear (note that you can perform a final list reversal and stay linear if you really need your numbers the other way around).
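Putting it together, a minimal sketch of the naive algorithm under these conventions (reusing the inc from above; toBits is my name):
-- One increment per Succ; each increment is amortized O(1),
-- so the whole conversion is O(N).  The result is least significant bit first.
toBits :: UNat -> [Int]
toBits Zero     = []
toBits (Succ n) = inc (toBits n)
If you need the most-significant-bit-first order afterwards, a single reverse at the end keeps the whole thing linear.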
I like the other answers, but I find their asymptotic analyses complicated. I therefore propose another answer that has a very simple asymptotic analysis. The basic idea is to implement divMod 2 for unary numbers. Thus:
data UNat = Succ UNat | Zero
data Bit = I | O
divMod2 :: UNat -> (UNat, Bit)
divMod2 Zero = (Zero, O)
divMod2 (Succ Zero) = (Zero, I)
divMod2 (Succ (Succ n)) = case divMod2 n of
  ~(div, mod) -> (Succ div, mod)
Now we can convert to binary by iterating divMod2.
toBinary :: UNat -> [Bit]
toBinary Zero = []
toBinary n = case divMod2 n of
  ~(div, mod) -> mod : toBinary div
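A quick usage check (exampleBits is my name, not from the answer): the bits come out least significant first, matching the right-to-left representation discussed above.
-- 4 = Succ (Succ (Succ (Succ Zero))) converts to [O, O, I],
-- i.e. binary 100 read least significant bit first.
exampleBits :: [Bit]
exampleBits = toBinary (Succ (Succ (Succ (Succ Zero))))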
The asymptotic analysis is now pretty simple. Given a number n in unary notation, divMod2 takes O(n) time to produce a number half as big -- say, it takes at most c*n time for large enough n. Iterating this procedure therefore takes this much time:
c*(n + n/2 + n/4 + n/8 + ...)
As we all know, this geometric series sums to c*(2*n), so toBinary is also in O(n), with witness constant 2*c.
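If you want the question's BNat type rather than a list of bits, a little glue suffices without changing the asymptotics (bitsToBNat is my name; it reads the question's BNat constructors most significant bit outermost, as in its examples, and note that the question's UNat and BNat both declare a constructor called Zero, so in one module one of them would need renaming):
-- Pack an LSB-first bit list into the question's BNat,
-- wrapping later (more significant) bits around the outside.
bitsToBNat :: [Bit] -> BNat
bitsToBNat = foldl pack End
  where
    pack acc O = Zero acc
    pack acc I = One acc

toBNat :: UNat -> BNat
toBNat = bitsToBNat . toBinary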
If we have a function to increment a BNat, we can do this quite easily by running along the UNat, incrementing a BNat at each step:
toBNat :: UNat -> BNat
toBNat = toBNat' End
  where
    toBNat' :: BNat -> UNat -> BNat
    toBNat' c Zero = c
    toBNat' c (Succ n) = toBNat' (increment c) n
Now, this is O(NM), where M is the worst-case cost of increment. So if we can do increment in O(1), then the answer is yes.
Here's my attempt at implementing increment:
increment :: BNat -> BNat
increment = (reverse End) . inc' . (reverse End)
  where
    inc' :: BNat -> BNat
    inc' End = One End
    inc' (Zero n) = One n
    inc' (One n) = Zero (inc' n)
    reverse :: BNat -> BNat -> BNat
    reverse c End = c
    reverse c (One n) = reverse (One c) n
    reverse c (Zero n) = reverse (Zero c) n
This implementation of increment is not O(1): you have to reverse the BNat to look at the least significant bits, so each increment costs O(M), and the conversion is O(NM) overall. If we instead consider the BNat type to represent reversed binary numbers (least significant bit first), we don't need to reverse at all, and, as @augustss says, increment becomes amortized O(1), which gives you O(N) overall.