How to detect repeating digits in an infinite sequence? I tried Floyd & Brent detection algorithm but come to nothing... I have a generator that yields numbers ranging from 0 to 9 (inclusive) and I have to recognize a period in it.
Example test case:
import itertools
# of course this is a fake one just to offer an example
def source():
return itertools.cycle((1, 0, 1, 4, 8, 2, 1, 3, 3, 1))
>>> gen = source()
>>> period(gen)
(1, 0, 1, 4, 8, 2, 1, 3, 3, 1)
Since your sequence is not of the form Xn+1 = f(Xn), Floyd's or Brent's algorithms are not directly applicable to your case. But they may be extended to do the task:
k
).k
elements of the sequencek
-element subsequence.k
, updatek
and continue with the step 3.If the sequence cycling starts from the first element, ignore step 1 and start from step 2 (find next sequence element equal to the first element).
If the sequence cycling does not start from the first element, it is possible that Floyd's or Brent's algorithm finds some repeating element of the sequence that does not belong to the cycle. So it is reasonable to limit the number of iterations in steps 2 and 4, and if this limit is exceeded, continue with the step 1.
I have no idea about proper algorithms to apply here, but my understanding also is that you can never know for sure that you've detected a period if you have consumed only a finite number of terms. Anyway, here's what I've come up with, this is a very naive implementation, more to educate from the comments than to provide a good solution (I guess).
This one, however, "forgets" the initial order and returns a tuple that is a cyclic permutation of the actual period:
To compensate for this, you'll need to keep track of the offset.
Empirical methods
Here's a fun take on the problem. The more general form of your question is this:
The process to determine the repeating frequencies is known as the Fourier Transform. In your example case the signal is clean and discrete, but the following solution will work even with continuous noisy data! The FFT will try to duplicate the frequencies of the input signal by approximating them in the so-called "wave-space" or "Fourier-space". Basically a peak in this space corresponds to a repeating signal. The period of your signal is related to the longest wavelength that is peaked.
This gives the correct answer for this case,
10
though ifN
didn't divide the cycles cleanly, you would get a close estimate. We can visualize this via:Evgeny Kluev's answer provides a way to get an answer that might be right.
Definition
Let's assume you have some sequence
D
that is a repeating sequence. That is there is some sequenced
of lengthL
such that:D_i = d_{i mod L}
, wheret_i
is thei
th element of sequencet
that is numbered from 0. We will say sequenced
generatesD
.Theorem
Given a sequence
D
which you know is generated by some finite sequencet
. Given somed
it is impossible to decide in finite time whether it generatesD
.Proof
Since we are only allowed a finite time we can only access a finite number of elements of
D
. Let us suppose we access the firstF
elements ofD
. We chose the firstF
because if we are only allowed to access a finite number, the set containing the indices of the elements we've accessed is finite and hence has a maximum. Let that maximum beM
. We can then letF = M+1
, which is still a finite number.Let
L
be the length ofd
and thatD_i = d_{i mod L}
fori < F
. There are two possibilities forD_F
it is either the same asd_{F mod L}
or it is not. In the former cased
seems to work, but in the latter case it does not. We cannot know which case is true until we accessD_F
. This will however require accessingF+1
elements, but we are limited toF
element accesses.Hence, for any
F
we won't have enough information to decide whetherd
generatesD
and therefore it is impossible to know in finite time whetherd
generatesD
.Conclusions
It is possible to know in finite time that a sequence
d
does not generateD
, but this doesn't help you. Since you want to find a sequenced
that does generateD
, but this involves amongst other things being able to prove that some sequence generatesD
.Unless you have more information about
D
this problem is unsolvable. One bit of information that will make this problem decidable is some upper bound on the length of the shortestd
that generatesD
. If you know the function generatingD
only has a known amount of finite state you can calculate this upper bound.Hence, my conclusion is that you cannot solve this problem unless you change the specification a bit.